LGeRM: lemmatization of Middle French words
Abstract
Unlike most modern languages, Middle French had no stabilized spelling. A single word can be spelled in many ways, so traditional lemmatization methods cannot be applied. LGeRM (lemmes, graphies et règles morphologiques: lemmas, spellings and morphological rules) proposes a solution based on a databank of known lemmatized spellings and a set of graphical and morphological rules specific to the medieval language. LGeRM can help with consulting a dictionary and with browsing or lemmatizing medieval texts, and it can be useful in the electronic edition of manuscripts and the automatic construction of glossaries. This multipurpose tool is accessible on the Internet at www.atilf.fr/dmf.
Keywords: lemmatization, dictionary, glossary, Middle French, Old French.
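The combination described in the abstract, a databank of known spellings backed by graphical and morphological rules, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `KNOWN_FORMS`, `RULES` and `lemmatize`, the toy entries, and the three rewrite rules are hypothetical and are not taken from LGeRM or its data. The sketch looks a form up in the databank first and, if it is unknown, applies rewrite rules step by step until a known spelling is reached.

```python
import re

# Hypothetical toy databank: known spelling -> lemma (not LGeRM data).
KNOWN_FORMS = {
    "roi": "roi",
    "roy": "roi",
    "faire": "faire",
    "chevalier": "chevalier",
}

# Hypothetical rewrite rules, invented for illustration only.
RULES = [
    (re.compile(r"y"), "i"),          # graphical: y written for i
    (re.compile(r"z$"), "s"),         # graphical: final z written for s
    (re.compile(r"(\w)\1"), r"\1"),   # graphical: doubled letter
    (re.compile(r"s$"), ""),          # morphological: final -s inflection
]


def lemmatize(form, max_steps=3):
    """Return the sorted candidate lemmas for a spelling, or [] if none is found.

    The databank is consulted first; unknown forms are rewritten with the
    rules, one rule application per step, for at most max_steps steps.
    """
    form = form.lower()
    seen = {form}
    frontier = [form]
    for _ in range(max_steps + 1):
        hits = sorted({KNOWN_FORMS[f] for f in frontier if f in KNOWN_FORMS})
        if hits:
            return hits
        next_frontier = []
        for f in frontier:
            for pattern, replacement in RULES:
                variant = pattern.sub(replacement, f)
                if variant not in seen:
                    seen.add(variant)
                    next_frontier.append(variant)
        frontier = next_frontier
    return []


if __name__ == "__main__":
    for word in ["roy", "roys", "chevalierz", "ffaire", "xyz"]:
        print(word, lemmatize(word))
```

Returning a list rather than a single lemma reflects that an unstable spelling may plausibly map to several lemmas; how LGeRM itself ranks or filters such candidates is not stated in the abstract.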
Similar resources
Lemmatization and Lexicalized Statistical Parsing of Morphologically-Rich Languages: the Case of French
This paper shows that training a lexicalized parser on a lemmatized morphologically-rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop of performance when automatically assigned lemmas and POS tags ar...
Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization
This paper deals with the automatic construction of a lemmatizer from a Full Form Lemma (FFL) training dictionary and with the lemmatization of new words that are unseen in the FFL dictionary, i.e. out-of-vocabulary (OOV) words. Three methods of lemmatization of three kinds of OOV words (missing full forms, unknown words, and compound words) are introduced. These methods were tested on Czech test data. The best r...
Weigh your words - memory-based lemmatization for Middle Dutch
This article deals with the lemmatization of Middle Dutch literature. This text collection—like any other medieval corpus—is characterized by an enormous spelling variation, which makes it difficult to perform a computational analysis of this kind of data. Lemmatization is therefore an essential preprocessing step in many applications, since it allows the abstraction from superficial textual va...
A global model for joint lemmatization and part-of-speech prediction
We present a global joint model for lemmatization and part-of-speech prediction. Using only morphological lexicons and unlabeled data, we learn a partially supervised part-of-speech tagger and a lemmatizer which are combined using features on a dynamically linked dependency structure of words. We evaluate our model on English, Bulgarian, Czech, and Slovene, and demonstrate substantial improvemen...
Using multi-terminology indexing for the assignment of MeSH descriptors to health resources in a French online catalogue
BACKGROUND To assist with the development of a French online quality-controlled health gateway (CISMeF), an automatic indexing tool assigning MeSH descriptors to medical text in French was created. The French Multi-Terminology Indexer (FMTI) relies on a multi-terminology approach involving four prominent medical terminologies and the mappings between them. OBJECTIVE In this paper, we compare le...
Journal: TAL
Volume 50
Publication date: 2009